BTS 510 Lab 7

set.seed(12345)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Stat2Data)
theme_set(theme_classic(base_size = 16))

1 Learning objectives

  • Add predictors to a linear regression model
  • Interpret partial regression coefficients
  • Re-code predictors to answer your research questions
  • Compare different models using R^2_{change}

2 Data

  • ICU data from the Stat2Data package
    • ID: Patient ID code
    • Survive: 1 = patient survived to discharge or 0 = patient died
    • Age: Age (in years)
    • AgeGroup: 1 = young (under 50), 2 = middle (50-69), 3 = old (70+)
    • Sex: 1 = female or 0 = male
    • Infection: 1 = infection suspected or 0 = no infection
    • SysBP: Systolic blood pressure (in mm of Hg)
    • Pulse: Heart rate (beats per minute)
    • Emergency: 1 = emergency admission or 0 = elective admission

3 Research question

How do infection status (Infection) and admission type (Emergency) predict blood pressure?

4 Tasks

library(Stat2Data)
data(ICU)
  1. Conduct three linear regression models to address the above research questions.
  • Model 1: Infection status predicts blood pressure
  • Model 2: Admission type predicts blood pressure
  • Model 3: Infection status and admission type predict blood pressure
m1 <- lm(data = ICU, SysBP ~ Infection)
m2 <- lm(data = ICU, SysBP ~ Emergency)
m3 <- lm(data = ICU, SysBP ~ Infection + Emergency)
summary(m1)

Call:
lm(formula = SysBP ~ Infection, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-87.452 -18.672  -2.062  18.548 117.328 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  138.672      2.986  46.440  < 2e-16 ***
Infection    -15.220      4.608  -3.303  0.00113 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 32.16 on 198 degrees of freedom
Multiple R-squared:  0.05223,   Adjusted R-squared:  0.04744 
F-statistic: 10.91 on 1 and 198 DF,  p-value: 0.001134
summary(m2)

Call:
lm(formula = SysBP ~ Emergency, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-92.646 -18.646  -0.502  17.426 127.354 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  142.358      4.460  31.918  < 2e-16 ***
Emergency    -13.712      5.202  -2.636  0.00906 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 32.47 on 198 degrees of freedom
Multiple R-squared:  0.0339,    Adjusted R-squared:  0.02902 
F-statistic: 6.947 on 1 and 198 DF,  p-value: 0.009061
summary(m3)

Call:
lm(formula = SysBP ~ Infection + Emergency, data = ICU)

Residuals:
    Min      1Q  Median      3Q     Max 
-85.455 -18.694  -1.231  18.657 120.992 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  146.194      4.569  31.995  < 2e-16 ***
Infection    -13.553      4.630  -2.927  0.00382 ** 
Emergency    -11.186      5.178  -2.160  0.03196 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.87 on 197 degrees of freedom
Multiple R-squared:  0.07416,   Adjusted R-squared:  0.06476 
F-statistic:  7.89 on 2 and 197 DF,  p-value: 0.0005054
library(modelsummary)
modelsummary(list(m1, m2, m3))
(1) (2) (3)
(Intercept) 138.672 142.358 146.194
(2.986) (4.460) (4.569)
Infection -15.220 -13.553
(4.608) (4.630)
Emergency -13.712 -11.186
(5.202) (5.178)
Num.Obs. 200 200 200
R2 0.052 0.034 0.074
R2 Adj. 0.047 0.029 0.065
AIC 1959.9 1963.7 1957.2
BIC 1969.8 1973.6 1970.4
Log.Lik. -976.933 -978.849 -974.592
F 10.911 6.947 7.890
RMSE 32.00 32.31 31.63
cor(ICU[, c(6, 7, 9)])
           Infection      SysBP  Emergency
Infection  1.0000000 -0.2285386  0.1666485
SysBP     -0.2285386  1.0000000 -0.1841112
Emergency  0.1666485 -0.1841112  1.0000000
  1. Which is the best model?
  • Use the F-test for R^2_{change} and the AIC values to help you decide. Report the tests and values that you used to decide this.
anova(m3, m1)
Analysis of Variance Table

Model 1: SysBP ~ Infection + Emergency
Model 2: SysBP ~ Infection
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1    197 200057                              
2    198 204796 -1   -4739.2 4.6668 0.03196 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m3, m2)
Analysis of Variance Table

Model 1: SysBP ~ Infection + Emergency
Model 2: SysBP ~ Emergency
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1    197 200057                                
2    198 208758 -1   -8700.7 8.5677 0.003824 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(m1)
[1] 1959.866
AIC(m2)
[1] 1963.698
AIC(m3)
[1] 1957.184
  1. Report the results of the best model, including:
  • Intercept value, test statistic, p-value, interpretation (in words)
  • Slope value(s), test statistic(s), p-value(s), interpretation(s) (in words)
  • R^2 value, test statistic, p-value, interpretation (in words)
m3_coeff <- summary(m3)$coefficients
m3_coeff
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 146.19427   4.569246 31.995271 5.834311e-80
Infection   -13.55310   4.630268 -2.927066 3.824208e-03
Emergency   -11.18636   5.178192 -2.160282 3.195834e-02
Note

Look at the code for the section below to see how to programmatically (i.e., automatically) grab the coefficient values from the output and have them print in your document.

  • \hat{SysBP} = 146.194 + -13.553 \times Infection + -11.186 \times Emergency

  • The intercept (b_0) was 146.194, t(1) = 31.995, p < .0001. An individual who did not have an infection and had a non-emergency admission is expected to have blood pressure of 146.194.

  • The slope for Infection (b_1) was -13.553, t(1) = -2.927, p<.01. Holding Emergency constant, the difference between in blood pressure between an infection and a non-infection patient is -13.553 points. Patients with an infection have lower blood pressure.

  • The slope for Emergency (b_2) was -11.186, t(1) = -2.160, p<.05. Holding Infection constant, the difference between in blood pressure between an emergency admit and a non-emergency admit patient is -11.186 points. Patients with an emergency admission have lower blood pressure.

  1. Based on model 3, what are the predicted blood pressures for the 4 combinations of infection status and admission type? (i.e., infected non-emergency, non-infected non-emergency, infected emergency, non-infected emergency)
  • Non-infected non-emergency
summary(m3)$coefficients[1] + summary(m3)$coefficients[2] * 0 + summary(m3)$coefficients[3] * 0
[1] 146.1943
  • Infected non-emergency
summary(m3)$coefficients[1] + summary(m3)$coefficients[2] * 1 + summary(m3)$coefficients[3] * 0
[1] 132.6412
  • Non-infected emergency
summary(m3)$coefficients[1] + summary(m3)$coefficients[2] * 0 + summary(m3)$coefficients[3] * 1
[1] 135.0079
  • Infected emergency
summary(m3)$coefficients[1] + summary(m3)$coefficients[2] * 1 + summary(m3)$coefficients[3] * 1
[1] 121.4548
  1. Using the predicted and residual values, assess whether the assumptions are met for model 3. Include the plots and describe your conclusions based on them.
ICU$pred <- fitted(m3)
ICU$resi <- residuals(m3)
ggplot(data = ICU, 
       aes(x = Infection, y = resi)) + 
  geom_point(alpha = 0.3, 
             size = 2) +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at -0.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: neighborhood radius 1.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 0
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
-0.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 0
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01

ggplot(data = ICU, 
       aes(x = Emergency, y = resi)) + 
  geom_point(alpha = 0.3, 
             size = 2) +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at -0.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: neighborhood radius 1.005
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 0
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 1.01
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
-0.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
1.005
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 0
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 1.01

ggplot(data = ICU, 
       aes(x = pred, y = resi)) + 
  geom_point(alpha = 0.3, 
             size = 2) +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: pseudoinverse used at 121.33
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: neighborhood radius 13.677
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: reciprocal condition number 1.4154e-14
Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
: There are other near singularities as well. 183.69
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
121.33
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
13.677
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
number 1.4154e-14
Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
else if (is.data.frame(newdata))
as.matrix(model.frame(delete.response(terms(object)), : There are other near
singularities as well. 183.69

ggplot(data = ICU,
       aes(x = resi)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  1. In plain language, answer the research question: How do infection status and admission type predict blood pressure? (This should not be lengthy – just a couple sentences.)